SPECIFICATION
BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] This invention relates to interactive, streaming or broadcast digital video coding, and
in particular relates to video coding in compliance with the ITU-T Recommendation H.264.

2. Description of the Related Art 

[0002] Broadcast television, home entertainment and on-line video streaming have been
revolutionized and unified by various video compression technologies. The ISO/IEC MPEG-4
(part 2 visual) and ITU-T H.263 are standards that represent state-of-the-art video
compression and decompression technology from circa 2000. In the late 1990s and in
parallel with development of H.263 version 3, technical work for a successor to the H.263
video coding standard began within the ITU-T's Video Coding Experts Group (VCEG). In
December 2001, the MPEG video group together with VCEG formed a Joint Video Team
(JVT) with the goal of leveraging the VCEG work to create a unified video coding standard.
The JVT finished work on version 1 of the video coding standard known as ITU-T
Recommendation H.264 and ISO/IEC 14496-10 AVC in 2003. The ITU-T Recommendation
H.264 is available at 
http://www.itu.int/rec/recomm 
200305-P . It is hereby incorporated by reference. 

[0003] The new standard surpasses earlier video standards in terms of compression efficiency
and resilience to data loss. The improved data compression offers advantages in terms of
bandwidth usage. Specifically, given the same video source input, the video pictures
reproduced after coding/decoding in compliance with H.264 typically have the same quality
as the video pictures reproduced after coding/decoding in compliance with the H.263,
MPEG-2 or MPEG-4 (part 2) video coding standards while using approximately half the
bandwidth. The many application areas likely to benefit include videoconferencing, video
broadcast, streaming and video on mobile devices, telemedicine and distance learning.

[0004] Even though H.264 represents a major breakthrough in video compression
technology, there are occasions, especially when coding at low data rates, when the video
pictures reproduced after coding/decoding in compliance with the H.264 standard have visual 
artifacts. Some of those artifacts appear as localized light and dark regions (spanning a few 
pixels) located along borders or edges of fast-moving objects in the reproduced video 
pictures. The artifacts appear to make the edges "sparkle." Each pixel in the "sparkle" 
artifact is referred to as a "sparkle pixel." The artifacts appear more readily when the edges 
of the moving objects are oriented in certain directions. When the transmission bit rate is 
below a certain rate (dependent on the video source), the artifacts in the reproduced video 
pictures increase substantially, and may become very distracting. 

[0005] It is desirable to identify the causes of the sparkling artifacts so as to devise a
method and an apparatus that improve the video quality by reducing or eliminating the
distracting sparkling artifacts in the reconstructed video pictures.

BRIEF SUMMARY OF THE INVENTION

[0006] The present invention identifies the cause of the distracting sparkling artifacts in the
video pictures processed in compliance with the H.264 video coding standard described
above. The cause stems from a deficiency in the 4x4 intra prediction process associated with
prediction modes whose prediction directions are not in the same general direction as the
video raster scan. A method according to the present invention identifies three problem 4x4
intra prediction modes in the intra prediction process. Then, after decoding the bit stream
according to the H.264 standard, the method applies a spatial filter to specific regions in the
decoded video pictures to significantly reduce or eliminate such artifacts. Many different
filters may be used. A complementary method of the present invention is to apply a different
filter at another location in the video picture processing, e.g. before encoding the video
pictures, such that the decoded pictures will have fewer or no distracting sparkling artifacts.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

[0007] A better understanding of the invention can be had when the following detailed
description of the preferred embodiments is considered in conjunction with the following
drawings, in which:

[0008] Figure 1 depicts a sample 16x16 macroblock in a video picture.

[0009] Figure 2 depicts a sample of a 4x4 block to be predicted, which is within the 16x16
macroblock.

[00010] Figures 3.0-3.8 depict the 9 different intra-block prediction modes available according
to H.264.

[00011] Figures 4.0-4.8 depict the 9 different results from the sample as in Figure 2, after the
prediction modes are applied.

[00012] Figure 5 depicts the mode directions of the 9 modes.

DETAILED DESCRIPTION OF THE INVENTION 

[00013] Video pictures contain a very large amount of information. It is generally very
cumbersome and even impractical to store or transmit raw video pictures. Raw video
pictures are therefore generally coded according to a certain scheme. The coded video pictures or bit
streams are stored for later viewing or transmitted through a network for viewing at another
location. When the video pictures need to be displayed, they are decoded by reversing the
coding process and reproduced. The H.264 standard defines one such coding scheme. A
video codec here refers to a system that performs the encoding and decoding of digital video pictures. A video
processing scheme may have some features that are common among all video processing and
some features that are peculiar to the scheme. For example, H.264 has new and unique
features, such as intra block prediction, but still uses common transform coding. It is
discovered during the investigation that only the intra prediction process is related to the
cause of the distracting sparkling artifacts, as will be discussed below.

[00014] In encoding and decoding video pictures according to the H.264 standard or most
other video coding standards, the video pictures are processed macroblock-by-macroblock
and macroblock row by macroblock row. Specifically, the processing starts in the first
macroblock row (top-left corner) and proceeds to the end of the row (top-right corner). The
subsequent rows are processed in a similar fashion (left to right). This ordering is often
referred to as a raster scan order or simply a raster scan (such as in an analog TV). It follows
that in any neighborhood, a pixel to the left is reproduced before a pixel to the right and a
pixel above is reproduced before the pixel below. Therefore, the values of pixels to the left
or above are known before the values of the pixels to the right or below. For the convenience
of discussion below, a direction from left to right horizontally is defined as the +X direction;
a direction from top to bottom vertically is defined as the +Y direction. A "first quadrant
direction" is defined as a direction that lies between the +X direction and +Y direction,
including the +X direction and the +Y direction. More specifically, any direction that starts out as a
+X direction, rotates clockwise until it overlaps with the +Y direction, is a "first quadrant
direction." Any direction that starts out from the +Y direction (+Y direction not included),
rotates clockwise until it overlaps with the -X direction (-X direction not included) is a
"second quadrant direction." Any direction that starts out from -Y direction (-Y direction not
included), rotates clockwise until it overlaps the +X direction (+X direction not included), is a
"fourth quadrant direction."

[00015] Two of the features in an H.264 compliant codec that improve the codec's coding
efficiency are the inter-block prediction and the intra-block prediction. Each picture may
have numerous pixels, which may be grouped into macroblocks and blocks. A macroblock 
represents a portion of a picture containing a 16x16 pixel region. A macroblock may be 
partitioned into sixteen (16) luma blocks each containing 4x4 luma pixels. Macroblocks may 
be coded with "inter-block" prediction, meaning that information from previously coded 
pictures is used to predict the content of the picture currently being coded. Macroblocks may 
also be coded with "intra-block" prediction, meaning that information from previously coded 
blocks in the same picture is used to predict the content of the block currently being coded. 
The intra-block prediction methods in H.264 are described in the H.264 standard, Section 8.3 
Intra Prediction Process. The particular mode of intra-block prediction used in coding a 
particular block is determined by the features of the particular block. The selected mode of 
intra-block prediction is indicated in the encoded video bit stream. An H.264 compliant codec
can then decode the video pictures using the indicated mode of intra-block prediction and 
reproduce the video pictures. 

[00016] An H.264 compliant codec typically uses a Quantization Parameter (QP) to adjust the
overall bit rate to achieve some target rate, i.e. channel bandwidth. The smaller the QP, the
greater the amount of information conveyed in the encoded bit stream and the higher the 
corresponding bit rate. The larger the QP, the smaller the amount of information conveyed in 
the bit stream. Furthermore, the larger the QP, the poorer the quality of the decoded or 
reproduced video pictures. The choice of QP of a particular picture is related to, among other 
things, the complexity of the video picture, the degree of change between adjacent pictures, 
and the target bit rate. 

[00017] In Figures 1 and 2, a portion of a sample picture is shown. Figure 2 shows a
macroblock at the lower-right corner to be coded using intra-prediction. The pixels in the
portions at the top or the left are presumably available for prediction.

[00018] Figures 3.0-3.8 depict the nine (9) different 4x4 intra-block prediction modes
according to the H.264 standard. The actual algorithm for each mode is specified and
mandated in Section 8.3.1.2.1 through Section 8.3.1.2.9 of the H.264 standard, which are 
incorporated by reference. Figures 4.0-4.8 illustrate the reproduced block for each prediction 
mode. For a particular block, the encoder will select the prediction mode that minimizes the 
difference error (error between the actual block and the predicted block). The selected 
prediction mode is indicated in the coded video bit stream, along with the residual. The 
residual is computed by taking a pixel-by-pixel difference between the predicted block and 
the block in the original video picture that is being coded. In the above illustration Figures 
4.0-4.8, mode 7 (Fig. 4.7) is selected because the attributes of the block in the picture 
(bottom-left to top-right orientation) correspond to that particular prediction mode. 
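The mode selection and residual computation described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative process of Section 8.3 of the standard: it covers only modes 0, 1 and 2 (vertical, horizontal and DC), assumes that all neighboring pixels are available, and uses the sum of absolute differences as the error measure, since the text does not name a particular metric. The function names are hypothetical.

```python
def predict_4x4(mode, above, left):
    """Return a 4x4 predicted block (list of rows) for mode 0, 1 or 2.

    above: the 4 reconstructed pixels directly above the block.
    left:  the 4 reconstructed pixels directly to the left of the block.
    """
    if mode == 0:  # vertical: copy the pixels above down each column
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy the pixels on the left across each row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighboring pixels
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def sad(block, pred):
    """Sum of absolute differences between the actual and predicted block."""
    return sum(abs(block[y][x] - pred[y][x])
               for y in range(4) for x in range(4))


def select_mode(block, above, left, modes=(0, 1, 2)):
    """Pick the mode with the smallest error and return (mode, residual)."""
    best = min(modes, key=lambda m: sad(block, predict_4x4(m, above, left)))
    pred = predict_4x4(best, above, left)
    # Residual: pixel-by-pixel difference between the original and the prediction.
    residual = [[block[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
    return best, residual
```

For a block whose columns simply continue the pixels above it, the vertical mode predicts the block exactly and the residual is zero everywhere.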

[00019] Figure 5 further illustrates the prediction directions of the above mentioned nine (9)
intra-block prediction modes. A prediction direction is the direction from the base pixels to
the predicted pixels. For example, in mode 0 (vertical), the predicted pixel is determined by 
the pixels vertically above it. Therefore, the prediction direction is vertically downward (as 
indicated in Figure 3.0), or the +Y direction. The prediction direction of mode 0 is a "first 
quadrant direction." In mode 3 (diagonal down-left), the predicted pixel is determined by the 
pixels to the upper-right. Therefore, the prediction direction (as indicated in Figure 3.3 and 
Figure 5) is from upper-right to lower-left. The prediction direction of mode 3 is a "second 
quadrant direction." Similarly, the prediction directions of modes 0, 1, 4, 5 and 6 are "first 
quadrant directions." The prediction directions of modes 3 and 7 are "second quadrant 
directions." The prediction direction of mode 8 is a "fourth quadrant direction." In mode 2
(DC), the predicted pixel is the average of the neighboring pixels above and to the left of the
block. Therefore, mode 2 has no direction.
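The quadrant definitions and the classification of the nine modes can be checked with a short sketch. The direction vectors below are illustrative approximations of the directions shown in Figure 5 (with +X to the right and +Y downward); they are assumptions chosen for this example and are not taken from the standard. The names are hypothetical.

```python
def quadrant(dx, dy):
    """Classify a direction vector using the definitions in the text.

    +X points right and +Y points down. Directions the text does not
    name (from -X through -Y) return None.
    """
    if dx == 0 and dy == 0:
        raise ValueError("a direction needs a non-zero vector")
    if dx >= 0 and dy >= 0:
        return "first"    # from +X to +Y, both included
    if dx < 0 and dy > 0:
        return "second"   # between +Y and -X, both excluded
    if dx > 0 and dy < 0:
        return "fourth"   # between -Y and +X, both excluded
    return None


# Illustrative direction vectors (assumed) for the 4x4 intra modes.
MODE_DIRECTION = {
    0: (0, 1),    # vertical
    1: (1, 0),    # horizontal
    2: None,      # DC: no direction
    3: (-1, 1),   # diagonal down-left
    4: (1, 1),    # diagonal down-right
    5: (1, 2),    # vertical-right
    6: (2, 1),    # horizontal-down
    7: (-1, 2),   # vertical-left
    8: (2, -1),   # horizontal-up
}


def problem_modes():
    """Modes whose direction is a second or fourth quadrant direction."""
    return sorted(m for m, d in MODE_DIRECTION.items()
                  if d is not None and quadrant(*d) in ("second", "fourth"))
```

With these vectors, `problem_modes()` yields modes 3, 7 and 8, which matches the classification given above.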

[00020] During the investigation into the causes of the sparkling artifacts in H.264 decoded
video pictures, it is observed that the artifacts appear more pronounced when the scenes being
coded require the quality of the H.264 coded video picture to be significantly reduced in
order to achieve some fixed target data rate (i.e., a higher quantization parameter is used).

[00021] It is also discovered that the sparkling artifacts appear near the edges of fast-moving
objects in the decoded and rendered video pictures. In the vicinity of the sparkling edges of
objects in a video picture, it is further discovered that the artifacts appear in edges having an
orientation of diagonal down-left (mode 3), vertical-left (mode 7) and horizontal-up (mode 8)
as defined in the H.264 standard or, as we have defined here, in the second or fourth quadrant
directions. The vicinity of an edge of an object in a picture is an area in the picture zero to
several pixels (for example 4 pixels for pictures in a CIF format) away from the edge of the
object. The vicinity of the edge may include the area both above/left of the edge and
below/right of the edge. When the sparkling artifacts appear near an edge of an object, an
appreciable fraction of the pixels in the vicinity of the edge are sparkle pixels.

[00022] It is discovered that when modes 3, 7 or 8 are used, the resulting reproduced pictures
have substantial sparkling artifacts. It is also discovered that these three prediction modes all
have prediction directions that are second or fourth quadrant directions. The prediction 
directions of these three modes are, to some extent, against the direction of the raster scan. 
The other five 4x4 intra prediction modes (modes 0, 1, 4, 5 and 6) have prediction directions 
that are first-quadrant directions, which are generally in the same direction as the raster scan. 
The remaining prediction mode, mode 2, does not have an associated direction. 

[00023] Because the prediction directions in the three problem modes are not in the same
general direction as the raster scan, some of the pixels, which would ideally be used to predict
the pixels in the current block, are not available. Instead, pixels from non-optimal locations 
are used for prediction. Consequently, when a 4x4 intra prediction employs one of these 
three modes, the residual tends to be large (as a result of the poor prediction). If the residual 
is coded with too high a QP value, visually objectionable sparkling artifacts result. The high 
QP value may result when the available video bandwidth is insufficient to effectively code a 
video scene. 

[00024] Once the cause of the sparkling artifacts is identified to be the use of an intra-block
prediction mode whose prediction direction is a second or fourth quadrant direction, a
solution may be found. In the case of H.264, the cause of the sparkling artifacts is the use of 
three specific 4x4 intra-block prediction modes, i.e. modes 3, 7 and 8. A spatial filter is used 
to reduce the visibility of the sparkling artifacts. Specifically, the filter is designed to smooth 
the regions in the decoded video picture corresponding to the use of the three problem 4x4 
intra prediction modes. It may be beneficial to filter the entire 16x16 macroblock when one 
or more of its 4x4 blocks are identified as needing filtering. 
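The macroblock-level variant mentioned at the end of this paragraph can be sketched as a lookup from flagged 4x4 blocks to their enclosing 16x16 macroblocks. The data layout is an assumption made for this example: `block_modes` is a hypothetical dictionary mapping 4x4 block coordinates (in block units) to the intra prediction mode used for that block.

```python
PROBLEM_MODES = frozenset({3, 7, 8})


def macroblocks_to_filter(block_modes):
    """Return the set of macroblock coordinates that contain at least one
    4x4 block coded with a problem mode.

    block_modes: dict mapping (bx, by), in units of 4x4 blocks, to the
    intra prediction mode of that block (assumed layout).
    """
    flagged = set()
    for (bx, by), mode in block_modes.items():
        if mode in PROBLEM_MODES:
            # A 16x16 macroblock spans 4 blocks in each dimension.
            flagged.add((bx // 4, by // 4))
    return flagged
```

A single flagged 4x4 block is enough to mark its whole macroblock for filtering, which trades some sharpness for simpler region bookkeeping.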

[00025] One simple way to reduce the artifact caused by the problem 4x4 intra prediction is to
put a filter after decoding. In one embodiment of the current invention, a simple 3-tap linear
Finite Impulse Response (FIR) filter may be used, with tap weights [1/4, 1/2, 1/4]. The
overhead caused by this additional processing is typically small and the distracting sparkling
artifacts are reduced significantly. To reduce the coding artifacts further, other more
sophisticated filters may be used, for example, a vertical filter (instead of a horizontal filter),
a FIR filter with more taps, or a 2-dimensional FIR filter.
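A horizontal 3-tap filter with weights [1/4, 1/2, 1/4] can be sketched on a single row of integer pixel values as follows. Two details are assumptions made for this example, since the text does not specify them: the border pixels are replicated at the ends of the row, and the result is rounded to the nearest integer by adding 2 before the shift. The function name is hypothetical.

```python
def smooth_row(row):
    """Apply the [1/4, 1/2, 1/4] FIR filter along one row of pixels.

    Border handling (assumed): the first and last pixels are replicated.
    Rounding (assumed): (a + 2*b + c + 2) >> 2.
    """
    n = len(row)
    out = []
    for x in range(n):
        a = row[max(x - 1, 0)]       # left neighbor, clamped at the border
        c = row[min(x + 1, n - 1)]   # right neighbor, clamped at the border
        out.append((a + 2 * row[x] + c + 2) >> 2)
    return out
```

For example, an isolated bright pixel in `[10, 10, 50, 10, 10]` is spread to `[10, 20, 30, 20, 10]`, while a constant row is left unchanged. Applying the same function to columns instead of rows would give the vertical variant.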

[00026] During the decoding process, it is determined whether a given macroblock is 4x4 intra
coded. If the macroblock is 4x4 intra-coded, then the specific prediction mode is identified.
If the prediction mode happens to be one of the three problem modes, then an additional
post-decoding step is performed prior to rendering the decoded video picture. Specifically, the
decoded video picture is passed through a predetermined spatial filter designed to mitigate
the distracting sparkling artifacts. In the embodiment discussed above, the spatial filter is a
3-tap FIR with tap weights [1/4, 1/2, 1/4]. With the application of the spatial filter, a
substantial number of the sparkle pixels in the reproduced pictures may be eliminated. The
effectiveness of the sparkle pixel reduction process depends on the complexity of the scene 
being coded as well as the filter characteristics. In some cases, virtually all of the sparkling 
artifacts can be eliminated. 
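The post-decoding step can be sketched as follows, assuming the decoder exposes the prediction mode of each 4x4 intra-coded block. The `block_modes` dictionary, the restriction of the filter to the flagged 4x4 regions, and the border and rounding choices are assumptions made for this illustration; a real decoder would supply this information in its own form.

```python
PROBLEM_MODES = frozenset({3, 7, 8})


def postfilter(picture, block_modes):
    """Smooth only the 4x4 regions coded with a problem mode.

    picture:     list of rows of integer luma samples (decoded picture).
    block_modes: dict mapping (bx, by), in 4x4 block units, to the intra
                 prediction mode of that block (assumed layout).
    Returns a new picture; the input is not modified.
    """
    height, width = len(picture), len(picture[0])
    out = [row[:] for row in picture]
    for (bx, by), mode in block_modes.items():
        if mode not in PROBLEM_MODES:
            continue  # blocks coded with other modes are left untouched
        for y in range(4 * by, min(4 * by + 4, height)):
            for x in range(4 * bx, min(4 * bx + 4, width)):
                a = picture[y][max(x - 1, 0)]
                c = picture[y][min(x + 1, width - 1)]
                # Horizontal [1/4, 1/2, 1/4] filter with rounding.
                out[y][x] = (a + 2 * picture[y][x] + c + 2) >> 2
    return out
```

The filter reads from the unfiltered picture and writes to a copy, so the result does not depend on the order in which blocks are visited.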

[00027] Instead of using a post-decoding filter, a pre-encoding filter may be applied. The
filter would be designed to eliminate or reduce the potential residual that would likely result
in the selection of one of the three problem 4x4 intra block prediction modes during the 
encoding process. The filter would be designed such that it would only smooth in the vicinity 
of the boundaries or edges whose orientation corresponds to the second or fourth quadrant 
directions. The filter would have little or no effect on picture regions that do not have such 
boundaries or edges. After being modified (smoothed) by this pre-filter, the video picture 
could have features that are less likely to be encoded using one of the three problem
intra-block prediction modes. Even if these prediction modes are chosen, the pre-filtering will
result in a lower amount of residual energy (as compared to the coding of the non-filtered 
image) which in turn would reduce or eliminate the sparkling artifacts. The filtered video 
pictures will be encoded and decoded according to H.264 as usual. 

[00028] As indicated above, the sparkling artifacts are not problematic when there are enough
bits to effectively code the residual. In this sense, the coding artifacts are related to the
bandwidth, but are not entirely determined by the bandwidth. Some scenes, for example, a
lecture with a single speaker and a plain background at CIF (352x288) resolution, coded at
low bandwidths (e.g. 128 Kbps) would not exhibit significant sparkling artifacts, because
enough bits would be available to effectively code the residual. Other, more complex scenes
such as a basketball game might exhibit the sparkling artifacts even at high bandwidths (e.g.
1024 Kbps), because the complexity of the scene does not leave enough bits to adequately
code the residual.

[00029] It is discovered that QP is a good indicator of whether there will be significant
sparkling artifacts in the reproduced video pictures. Specifically, if one of the 3 problem
intra prediction modes is selected and the QP is large, the residual will most likely not be 
coded accurately enough to compensate for the poor prediction. In this regard, the QP value 
can be used to selectively apply the artifact reduction process. If the QP is below a given 
threshold, the sparkling artifacts may be insignificant and tolerable and the artifact reduction 
process is not needed. In this situation, the artifact reduction process is not activated. If the 
QP is above the threshold, then the artifact reduction process is needed. In this situation, the 
artifact reduction process is activated. In the post-filter embodiment, the additional process
for artifact reduction is inserted after the decoding process but before the displaying
process. The selection of a particular QP threshold is a design choice. It represents a
balance between computational overhead, the picture sharpness and the amount of allowable 
coding artifacts. The threshold value of QP may vary depending on the type of video: 
smooth or highly detailed, stationary or full of motion. With respect to the H.264 codec, a 
QP threshold between 20 and 35 represents a good compromise. The QP threshold may also 
be adjusted depending on the reproduced pictures. 
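The QP-based gating can be sketched as a simple predicate. The default threshold of 28 is an arbitrary value inside the 20 to 35 range mentioned above, chosen only for illustration, and the treatment of a QP exactly equal to the threshold (no filtering) is also an assumption, since the text addresses only values below and above it. The names are hypothetical.

```python
PROBLEM_MODES = frozenset({3, 7, 8})


def needs_artifact_reduction(mode, qp, qp_threshold=28):
    """Decide whether the artifact reduction process should be activated.

    mode:         4x4 intra prediction mode of the block (0-8).
    qp:           quantization parameter used to code the block.
    qp_threshold: design choice; 28 is an assumed example value.
    """
    return mode in PROBLEM_MODES and qp > qp_threshold
```

A lower threshold activates the filter more often, which removes more sparkle pixels at the cost of additional computation and some loss of sharpness.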

[00030] The present invention is not limited to the improvement of video pictures processed
by the H.264 codec. The present invention may be equally applicable to any picture
processing where an n x m intra-block prediction method is employed, especially in real-time
video broadcasting or real-time video conferencing applications, where n and m may be any
integers greater than 1. The particular intra-block prediction mode selected to code
a particular block is determined by the picture features of the block. For example, a border 
line in the direction of lower-left to upper-right in the block will mandate the selection of a 
mode with prediction direction of lower-left to upper-right, i.e. in the second quadrant 
direction or fourth quadrant direction. These features are not aligned with the raster scan 
direction. When a picture region is predicted using an intra-block prediction mode that has a 
prediction direction that is not in line with the raster scan direction, then the prediction is 
poor, i.e. the residual is large. When the QP is large, i.e. there are not enough bits to encode
the residual, a significant number of visually distracting artifacts will appear. A suitable
smoothing filter, either a pre-encoding filter or a post-decoding filter, may be added to reduce 
or eliminate these types of artifacts. 

[00031] While illustrative embodiments of the invention have been illustrated and described, it
will be appreciated by those skilled in the art that various changes can be made therein
without departing from the spirit and scope of the invention.